Transparency of COVID-19-related research: A meta-research study

Background We aimed to assess the adherence to five transparency practices (data availability, code availability, protocol registration, conflict of interest (COI) disclosure, and funding disclosure) in open access Coronavirus disease 2019 (COVID-19) related articles. Methods We searched and exported all open access COVID-19-related articles from PubMed-indexed journals in the Europe PubMed Central database published from January 2020 to June 9, 2022. With a validated and automated tool, we detected transparent practices in three paper types: research articles, randomized controlled trials (RCTs), and reviews. Basic journal- and article-related information was retrieved from the database. We used R for the descriptive analyses. Results The total number of articles was 258,678, of which we were able to retrieve full texts of 186,157 (72%) articles from the database. Over half of the papers (55.7%, n = 103,732) were research articles, 10.9% (n = 20,229) were review articles, and less than one percent (n = 1,202) were RCTs. Approximately nine-tenths of articles (in all three paper types) had a statement to disclose COI. Funding disclosure (83.9%, 95% confidence interval (CI): 81.7–85.8) and protocol registration (53.5%, 95% CI: 50.7–56.3) were more frequent in RCTs than in reviews or research articles. Reviews shared data (2.5%, 95% CI: 2.3–2.8) and code (0.4%, 95% CI: 0.4–0.5) less frequently than RCTs or research articles. Articles published in 2022 had the highest adherence to all five transparency practices. Most of the reviews (62%) and research articles (58%) adhered to two transparency practices, whereas almost half of the RCTs (47%) adhered to three practices. There were journal- and publisher-related differences in all five practices, and articles that did not adhere to transparency practices were more likely to be published in the lowest-impact journals and less likely to be cited.
Conclusion While most articles were freely available and had a COI disclosure, adherence to other transparent practices was far from acceptable. A much stronger commitment to open science practices, particularly to protocol registration, data and code sharing, is needed from all stakeholders.



Background
Access to research publications and to the underlying data and methods that enable the reuse and reproduction of research are core features of open science. For instance, research data that are findable, accessible, interoperable, and reusable (FAIR principles) are expected to facilitate knowledge discovery, promote collaboration across different research communities, and advance scientific research by reducing barriers to data sharing [1,2]. In this context, practices like protocol registration and the disclosure of conflicts of interest (COI) and funding help ensure the integrity of scientific research. Without research integrity, the credibility and reliability of scientific findings and their underlying data and methods may be compromised. However, during the recent coronavirus disease 2019 (COVID-19) pandemic, some members of the public in developed countries have cited a lack of transparency in the scientific studies used to justify public health measures [3]. Thus, international organizations like the Organisation for Economic Co-operation and Development (OECD) have stressed the importance of transparent communication with citizens to support public health measures and counter misinformation [4]. To facilitate rapid and collaborative scientific research, over 100 organizations, including universities, publishers, funders, and journals, signed a statement in January 2020 supporting unrestricted access to research data, tools, and other information related to COVID-19 [5].
The COVID-19 pandemic has also produced an enormous volume of scientific publications across various fields of study, leading to the development of vaccines and the evaluation of community interventions. While transparent scientific practices such as data and code sharing have helped combat the pandemic globally (e.g., by allowing information absorption via the established common data repositories and international and interdisciplinary collaborations) [6-8], the lack of transparency in some key developments and rampant dis- and misinformation (fake news, conspiracy theories) have contributed to public mistrust of research and public health measures [3]. This indicates a lack of awareness regarding the extent of transparency practices in COVID-19-related medical literature. As an example, it is unclear whether open science initiatives launched during the COVID-19 pandemic to advance data sharing [6,7] have led to a higher proportion of articles sharing data in COVID-19 research than has been seen in the biomedical literature in general [9].
We aimed to programmatically assess the adherence to transparent scientific practices (data sharing, code sharing, conflict of interest (COI) disclosure, funding disclosure, and protocol registration) in open access full text COVID-19-related articles published in PubMed-indexed journals from the Europe PubMed Central (EPMC) database.

Methods
The protocol of this descriptive study was published beforehand on the Open Science Framework (OSF) website (https://osf.io/5kx2n). All code and data related to the study were shared via its OSF repository (https://osf.io/x3kb6) and GitHub (https://github.com/choxos/covidtransparency) at the time of submission of the manuscript. Deviations from the protocol are available in S1 Text.

Data sources and study selection
First, we searched for all open access PubMed-indexed records available in the EPMC database from January 1, 2020 to June 9, 2022. This database included all the records available through PubMed and PubMed Central and also enabled us to retrieve the records automatically, which is not possible through the PubMed website. Then, we used the LitCovid database (https://ncbi.nlm.nih.gov/research/coronavirus) to detect COVID-19-related papers. LitCovid, sponsored by the National Library of Medicine, is a curated literature hub to track up-to-date COVID-19-related scientific information in PubMed. LitCovid is updated daily with newly identified relevant articles organized into curated categories. It uses machine learning and deep-learning algorithms [10,11]. We merged both datasets using the PubMed IDs (PMIDs) of the records.
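As an illustration only, the PMID-based merge can be sketched in Python. The study itself was carried out in R, so this is not the authors' code; the function name `merge_by_pmid`, the record fields, and the PMIDs below are hypothetical.

```python
def merge_by_pmid(epmc_records, litcovid_pmids):
    """Keep the EPMC records whose PMID also appears in the LitCovid list."""
    # Normalise PMIDs to strings so that 1001 and "1001" compare equal.
    covid_pmids = {str(p).strip() for p in litcovid_pmids}
    merged = []
    for record in epmc_records:
        pmid = record.get("pmid")
        if pmid is None:
            continue  # records without a PMID cannot be matched
        if str(pmid).strip() in covid_pmids:
            merged.append(record)
    return merged


# Toy input: titles and PMIDs are made up for illustration.
epmc = [
    {"pmid": "1001", "title": "A COVID-19 cohort study"},
    {"pmid": "1002", "title": "An unrelated cardiology paper"},
    {"pmid": None, "title": "Record without a PMID"},
]
litcovid = [1001, 1003]

covid_records = merge_by_pmid(epmc, litcovid)
print([r["pmid"] for r in covid_records])
```

Only records present in both sources survive the merge, which mirrors an inner join on the PMID column.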
We then downloaded the full texts of all identified and available open access COVID-19-related records in XML format from the EPMC database using the metareadr package [12].
We used the EPMC publication type variable to detect research articles and reviews. We used the "research articles" filter of EPMC (in publication type) to identify papers that have used and analyzed data (of any kind) and to exclude opinions, commentaries, and letters. We combined two publication types to identify reviews: "review" and "systematic-review". As the EPMC publication type is not comprehensive for randomized controlled trials (RCTs) and many of them are not labelled correctly, we used the Living OVerview of Evidence (L·OVE) platform (iloveevidence.com) to detect RCTs. L·OVE, powered by the Epistemonikos Foundation, maps and organizes the best evidence in various medical and health sciences fields. It has a database of COVID-19-related papers in several categories, including RCTs. This database has been shown to be very comprehensive [13]. Its RCT database includes both protocols and papers with results. As we wanted to focus solely on completed RCTs, we applied two filters, "RCT" and "Reporting data," on the L·OVE website and then downloaded the dataset with these characteristics. We used the PMIDs of the COVID-19-related RCTs in this downloaded dataset to detect RCTs in our main dataset of all open access COVID-19-related papers.
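The paper-type rules above can be sketched as a small Python function. This is a hedged illustration, not the authors' R code: the function name `flag_paper_types`, the exact label string `"research-article"`, and the PMIDs are assumptions. The three flags are kept independent because the text does not state whether the categories were treated as mutually exclusive.

```python
REVIEW_TYPES = {"review", "systematic-review"}  # the two types combined in the text
RESEARCH_TYPE = "research-article"  # assumed spelling of the EPMC label


def flag_paper_types(pmid, pub_types, rct_pmids):
    """Return independent flags for the three paper types."""
    types = {t.strip().lower() for t in pub_types}
    return {
        "is_research_article": RESEARCH_TYPE in types,
        "is_review": bool(types & REVIEW_TYPES),
        # RCTs come from an external PMID list, not from the EPMC label.
        "is_rct": str(pmid) in rct_pmids,
    }


# Hypothetical PMID set standing in for the downloaded L·OVE dataset.
rct_pmids = {"2001"}
flags = flag_paper_types("2001", ["research-article"], rct_pmids)
print(flags)
```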

Data extraction and synthesis
We assessed articles' adherence to five transparent practices: data sharing, code sharing, COI disclosure, funding disclosure, and protocol registration. We used a validated and automated tool developed by Serghiou et al. [9] that identifies these five practices in articles in XML format from the EPMC database. Briefly, this tool uses regular expressions built on keywords to identify adherence to each indicator.
For example, it searches for phrases commonly associated with a COI disclosure (e.g., "conflicts of interest," "competing interests," "Nothing to disclose," etc.) in the body or section titles of the text file of an article. For COI and funding disclosure, the tool only detects whether a disclosure is mentioned, regardless of whether there was anything to disclose. For instance, a paper with a phrase such as "Nothing to disclose" was considered transparent, just like a paper that had something to disclose. For data and code sharing, it counts material shared as a supplement or in a general (e.g., figshare, OSF, GitHub, etc.) or field-specific repository (e.g., dbSNP, ProteomeXchange, GenomeRNAi, etc.) as adherence. However, articles that indicated "data available upon request" were classified as not sharing data, since such data are unlikely to be obtained [14]. Overall, an indicator was classified as affirmative only when the article explicitly contained it.
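To make the keyword approach concrete, the following Python sketch shows the general idea of regular-expression detection. It is a deliberately simplified stand-in and does not reproduce the patterns of the tool by Serghiou et al. [9]; the pattern lists and function names are assumptions, and supplement-based or field-specific sharing is not covered.

```python
import re

# Phrases taken from the examples in the text; the real tool uses far
# richer patterns than these.
COI_PATTERN = re.compile(
    r"conflicts?\s+of\s+interests?|competing\s+interests?|nothing\s+to\s+disclose",
    re.IGNORECASE,
)

# Only the general repositories named in the text are matched here.
REPOSITORY_PATTERN = re.compile(r"\b(figshare|osf\.io|github\.com)\b", re.IGNORECASE)


def has_coi_statement(text):
    """True if any COI-related phrase appears, whatever is being disclosed."""
    return bool(COI_PATTERN.search(text))


def shares_data(text):
    """True only if a known repository is mentioned."""
    return bool(REPOSITORY_PATTERN.search(text))


print(has_coi_statement("Competing interests: Nothing to disclose."))
print(shares_data("Data are available at https://osf.io/x3kb6"))
print(shares_data("Data are available upon request."))
```

In this sketch a statement that data are available upon request mentions no repository and is therefore not counted as sharing, which matches the classification rule described above.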
For validation of the identification of transparency practices in the sample articles, we manually checked the five transparency indicators in 100 random articles from the sample with the methods described by Serghiou et al. [9] (available on OSF as covid_transparency_randomsample.csv or in S1 Appendix). The sensitivities and specificities of the tool for detecting each open science practice, as provided by its developers, are reported in [9].
Basic journal- and article-related information (publisher, publication year, citations to the article, and journal name) was retrieved from the EPMC database. Journal impact factors (JIFs) were obtained from the Journal Citation Reports 2021 (https://jcr.clarivate.com), and the SCImago Journal Rank 2020 (SJR) and H-index indicators were obtained from the SCImago website (https://scimagojr.com). We also calculated the proportion of articles available as open access full texts via the EPMC out of the total number of COVID-19-related articles in the database.

Data analysis
We used R v4.1.2 [15] for searches, data handling, analysis and reporting. The searches and data export from the EPMC were conducted with the europepmc package [16]. Indicators of transparency practices were extracted from the available full texts with the rtransparent package [17]. Trends over time in transparency practices were reported with descriptive tabulations and with visualizations made using the ggplot2 package [18]. We used the sensitivity and specificity of the rtransparent package [9] to generate 95% CIs for our prevalence estimates of the transparency practices with the epiR package [19]. We determined the level of adherence to transparency practices in articles, ranging from 0 to 5 practices. Generalized linear models with a logit link were used to analyze transparency indicators by JIF, SJR, and H-index or received citations in research articles. These models were adjusted for the month-year of publication and whether the article was an RCT. Fisher's exact test with Monte Carlo simulated p-values was performed to test for differences in transparency practices between journals and publishers. We reported the median and interquartile range (IQR, based on the first (Q1) and third (Q3) quartiles) instead of the mean and standard deviation when the data were skewed.
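One common way to correct an observed proportion for an imperfect detector is the Rogan-Gladen estimator, sketched below in Python. The text does not state which epiR routine or interval method was used, so this is an assumption-laden illustration rather than the authors' computation: it treats sensitivity and specificity as fixed, uses a simple Wald interval for the apparent prevalence, and runs on made-up counts.

```python
import math


def adjusted_prevalence(positives, total, sensitivity, specificity, z=1.959964):
    """Rogan-Gladen point estimate with a transformed Wald interval.

    Returns (estimate, lower, upper) as proportions clipped to [0, 1].
    """
    youden = sensitivity + specificity - 1
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")

    apparent = positives / total
    # Standard error of the apparent (uncorrected) prevalence.
    se_apparent = math.sqrt(apparent * (1 - apparent) / total)

    def correct(p):
        # Remove the expected false positives and rescale by Youden's index.
        return min(1.0, max(0.0, (p + specificity - 1) / youden))

    return (
        correct(apparent),
        correct(apparent - z * se_apparent),
        correct(apparent + z * se_apparent),
    )


# Hypothetical numbers, not taken from the study.
est, low, high = adjusted_prevalence(150, 1000, sensitivity=0.90, specificity=0.95)
print(f"{100 * est:.1f}% (95% CI: {100 * low:.1f}-{100 * high:.1f})")
```

With a perfect detector (sensitivity and specificity both equal to 1) the correction leaves the apparent prevalence unchanged.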

General characteristics
As of June 9, 2022, there were 258,678 COVID-19-related articles, including open access and non-open access publications. Of those, full texts of 186,279 (72.0%) articles were accessible via the EPMC. However, 122 (0.1%) of these articles were not downloadable because of technical issues and were excluded from our analyses. Consequently, the sample included 186,157 full text articles. Fig 1 shows the Venn diagram of the study.
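The sample sizes reported above can be checked with a few lines of Python; the variable names are ours, and the counts are those given in the text.

```python
total_articles = 258_678   # all COVID-19-related articles as of June 9, 2022
accessible = 186_279       # full texts accessible via the EPMC
not_downloadable = 122     # excluded because of technical issues

analysed = accessible - not_downloadable
share_accessible = 100 * accessible / total_articles
share_failed = 100 * not_downloadable / accessible

print(analysed)
print(round(share_accessible, 1))
print(round(share_failed, 1))
```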

Transparency practices
We found that 91,776 (88.5%) of the research articles had a statement to disclose COI, and the prevalence was higher in RCTs (92.6%) and reviews (91.9%). Funding disclosures were detected in 76,481 (73.7%) research articles, more frequently in RCTs (83.9%) and less frequently in reviews (69.0%). One in 25 research articles was registered beforehand (n = 4,362), and the proportion of registered articles was many times higher in RCTs (53.5%) than in reviews (4.9%) or research articles (4.2%). About one in ten research articles (n = 11,599, 11.2%) and RCTs (11.7%) shared data, whereas this proportion was four times lower in reviews (2.5%). One in 25 research articles shared code (n = 4,192, 4%). The proportion was lower in RCTs (0.7%) and reviews (0.4%) than in research articles. More information is illustrated in the left-hand bar charts in Fig 3 and in Table 1.

PLOS ONE
Transparency of COVID-19-related research

Research articles and reviews published in 2022 adhered the most to all five transparency practices (Table 1, Fig 3-right). We did not find a clear time trend for RCTs (Fig 3B-right). Adherence to all five transparency practices was identified in <0.1% of articles. Adherence to three practices was detected in 47% of RCTs and adherence to two practices in 62% of reviews (Table 2).
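The adherence level behind Table 2 is simply the number of practices detected per article, from 0 to 5. A minimal Python sketch of that tabulation follows; the field names and the four toy articles are hypothetical, and the percentages it produces are not study results.

```python
from collections import Counter

PRACTICES = ("coi", "funding", "registration", "data", "code")


def adherence_level(article):
    """Number of the five practices detected in one article (0 to 5)."""
    return sum(bool(article.get(p, False)) for p in PRACTICES)


def level_distribution(articles):
    """Percentage of articles at each adherence level."""
    counts = Counter(adherence_level(a) for a in articles)
    n = len(articles)
    if n == 0:
        return {level: 0.0 for level in range(6)}
    return {level: 100 * counts.get(level, 0) / n for level in range(6)}


# Toy data for illustration only.
toy_articles = [
    {"coi": True, "funding": True},
    {"coi": True, "funding": True, "registration": True},
    {"coi": True},
    {"coi": True, "funding": True},
]
distribution = level_distribution(toy_articles)
print(distribution)
```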
There were journal- and publisher-related differences in all five practices (Tables 3 and 4). All the papers from MDPI and PLoS had COI and funding disclosures. Whereas COI and funding disclosures were available for almost all the RCTs published in the New England Journal of Medicine and The Lancet, these RCTs rarely shared their data (0% and 2.9%, respectively). The highest percentage of data sharing was found in PLoS One research articles (54.8%) and RCTs (70.0%). RCTs published in The Lancet and The Lancet Respiratory Medicine had the highest adherence to protocol registration, with 94.1% and 100%, respectively. Code sharing was a rare practice, with Scientific Reports (14.2%) and PLoS (11.1%) being the top journal and the top publisher for adherence to this practice. None of the RCTs published in the journals with the highest number of RCTs in the sample shared code.
Research articles in the lowest JIF or SJR quintiles were least likely to adhere to the transparency practices. Regarding protocol registration, the differences were the smallest between the JIF or SJR quintiles (Fig 4A and 4C). The more citations articles received, the more likely the article adhered to the transparency practices (Fig 4B). Detailed information for RCTs and reviews is available in S1-S4 Tables.

Discussion
Our study showed that adherence to transparent practices increased in COVID-19-related medical literature from 2020 to 2022. Adherence to reporting COI disclosure was high throughout the study period. In addition, most articles had funding disclosure. Data sharing, code sharing and protocol registration were rare but improved slightly over the study period. Higher adherence to COI disclosure and protocol registration was seen in randomized trials and reviews than in other research articles. Most research papers and reviews adhered to two or fewer indicators, whereas most RCTs adhered to three or fewer indicators. Only 53 papers adhered to all indicators overall. Journal- and publisher-related differences in transparency practices were clear. While some journals and publishers were completely transparent regarding COI and funding disclosures, they performed poorly for the other three indicators. In addition, journals with the lowest JIFs and SJRs seemed to publish less transparent articles. Articles which received more citations were more likely to have adhered to transparency practices than articles with fewer citations.
In general, adherence to transparent practices, except for COI disclosure, was at a similar level in COVID-19-related literature as in other biomedical literature analyzed with the same methods (see Fig 3 in [9] and also [20,21]). This is surprising, particularly when considering the remarkable and noble worldwide initiatives to enhance open science to tackle the pandemic. As early as January 2020, over 100 organizations, including journals, publishers, funders, universities, and other institutions, signed a statement to ensure free access to research data, tools, and other information related to COVID-19 [5,7]. Later, other initiatives to support the goal emerged, for instance, the COVID-19 Open Research Dataset (CORD-19) [22], a free resource of over 280,000 articles about the COVID-19 virus.
However, it is possible that the algorithms used here did not efficiently detect all the different ways of sharing data and material that emerged during the pandemic, because the algorithms were validated before the pandemic [9]. Furthermore, any initiative or movement needs time to become effective. Another reason could be that publishers and journals have yet to implement these changes.
The lack of protocol registration (for RCTs and reviews in particular), code and data sharing, and COI disclosure received attention during the pandemic [6,23,24]. However, we are unaware of any investigation of transparency in these aspects of COVID-19-related medical research on this scale. Comparing our study to other studies on the transparency of COVID-19-related research is difficult due to the different methodologies and the scale of our research. Nevertheless, it seems that data sharing was a little less common in our sample than at the beginning of the pandemic, as measured by Lucas-Dominguez et al. in PubMed Central [25]. In addition, the proportion of studies that shared data in our study sample was lower than in COVID-19-related preprints shared via medRxiv and bioRxiv [26]. While some research articles may not be able to share their data due to constraints such as privacy or legal obligations, at least most of them could have shared their metadata, that is, "descriptive information about the context, quality and condition, or characteristics of the data" including "e.g., the data captured automatically by machines that generate data such as DICOM information for image files" [27]. On the other hand, a recent study of 200 articles showed that adherence to COI and funding disclosures improved from preprints to peer-reviewed publications and showed higher adherence to funding disclosures than our findings indicated [28]. In addition, adherence to transparency practices was higher in our sample than in our previous study on COVID-19-related research in dental journals [21].

Table 3. Transparency practices in the five most common journals in the sample that published research articles, randomized controlled trials (RCTs), and reviews (%).

It has been highlighted that transparency practices are associated with higher impact, that is, more citations [29]. However, meta-research findings have not always been that clear [9]. For instance, in the study of Serghiou et al. [9], which used the same methodology as here, COI disclosure was associated with a slightly lower number of citations (7, IQR: 3-18 vs. 6, IQR: 2-14). According to our findings, the relationship between the number of received citations and transparency indicators was stronger and more homogeneous than the relationship between journal impact/rank/H-index and transparency indicators, and stronger than that found by Serghiou et al. [9]. The relationships between transparency practices and the Transparency and Openness (TOP) Factor of journals could be explored in the future (https://topfactor.org/).
Even though we used representative data and validated methods to investigate transparency in COVID-19-related research, our study has some weaknesses. First, with the applied methods, we could not accurately identify which studies were required to register a protocol (e.g., by distinguishing meta-analyses from narrative reviews) or which studies did not produce any data or code to share. This means that even if every researcher had adhered to all applicable practices, we would likely not have observed 100% adherence rates (e.g., because not every study uses statistical procedures). Also, the study sample was restricted to open access articles in the EPMC database, which may not be representative of all COVID-19-related studies published in medical journals. In general, however, transparency practices appear to be similar in open access and non-open access articles from EPMC [30]. Moreover, as most COVID-19-related articles were available via EPMC (72%), this restriction is unlikely to diminish the strength of our interpretations considerably.
In addition, our analyses were restricted to the published articles. Thus, we did not evaluate the material provided or shared during peer review, which may have been more comprehensive than what ended up in the published article.
Transparency is crucial to ensure the credibility of science and enable its assessment [23,24]. Transparent scientific practices, like the ones we investigated here, have helped to fight the pandemic globally [6,7]. While most COVID-19-related articles were open access and adhered to disclosing funding and COI, our findings showed suboptimal adherence to data sharing, code sharing, and protocol registration. A stronger and more concrete commitment to open science practices, particularly to protocol registration, data sharing, and code sharing, is needed from all stakeholders. Societies and their people would be the beneficiaries [23].